Single Channel Speech Enhancement Using Ideal Binary Mask Technique Based on Computational Auditory Scene Analysis
Abstract
Single-channel speech enhancement is necessary where multichannel speech enhancement is not feasible because of space constraints in the intended device or for reasons of cost. However, the limited information available in a single-channel sound mixture, together with the unavailability of the speech source signals, makes it difficult to separate the target speech from background maskers, particularly at low signal-to-noise ratios, across varied background noises, and for speech signals of short duration. To address these problems, computational auditory scene analysis has become popular over the last decade as a new approach to speech enhancement. In this paper, the ideal binary mask, which is inspired by computational auditory scene analysis, is used to analyze and synthesize the input speech and masker signals in the time-frequency domain, where the signals usually overlap. The synthesized signals are evaluated for speech quality in terms of segmental signal-to-noise ratio. The study uses Malay-language speech as the input speech signals; these vary in duration because of their word structure. Large-crowd babble speech and two-talker competing speech are employed as masker signals. The input signal-to-noise ratio is varied from -5 dB to +15 dB in steps of 5 dB to vary the difficulty of the acoustic environment. Results show that the ideal binary mask algorithm reconstructs the target speech signals efficiently from the degraded, noisy speech signals, as indicated by a high segmental signal-to-noise ratio even at the lowest input signal-to-noise ratio. Noise reduction of this kind is necessary to reduce the listening effort of elderly listeners in noisy environments.
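The pipeline the abstract describes (mix target and masker at a chosen input SNR, compute an ideal binary mask in the time-frequency domain, resynthesize, and score with segmental SNR) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation. The abstract does not name the time-frequency front end, so a short-time Fourier transform stands in here (CASA systems commonly use a gammatone filterbank instead). The 0 dB local criterion `lc_db`, the frame sizes, the [-10, 35] dB clamping range in `segmental_snr`, and all function names are assumptions, and the signals are synthetic rather than Malay speech.

```python
import numpy as np

N_FFT, HOP = 256, 128  # assumed frame length and hop (50% overlap)
# Periodic square-root Hann window: at 50% overlap, WIN**2 sums to 1.
WIN = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N_FFT) / N_FFT))

def scale_masker_to_snr(target, masker, snr_db):
    """Scale the masker so that the target-to-masker power ratio is snr_db."""
    p_target = np.mean(target ** 2)
    p_masker = np.mean(masker ** 2)
    gain = np.sqrt(p_target / (p_masker * 10 ** (snr_db / 10)))
    return gain * masker

def stft(x):
    """Windowed frames -> complex spectrogram of shape (frames, bins)."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * WIN for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, length):
    """Weighted overlap-add resynthesis, normalized by the summed window power."""
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(np.fft.irfft(spec, n=N_FFT, axis=1)):
        out[i * HOP:i * HOP + N_FFT] += frame * WIN
        norm[i * HOP:i * HOP + N_FFT] += WIN ** 2
    return out / np.maximum(norm, 1e-8)

def ideal_binary_mask(target, masker, lc_db=0.0):
    """1 where the local target-to-masker ratio exceeds lc_db, else 0.

    "Ideal" because it needs the premixed target and masker signals.
    """
    eps = 1e-12
    p_t = np.abs(stft(target)) ** 2 + eps
    p_m = np.abs(stft(masker)) ** 2 + eps
    return (10 * np.log10(p_t / p_m) > lc_db).astype(float)

def segmental_snr(reference, estimate, frame_len=256, lo=-10.0, hi=35.0):
    """Mean of per-frame SNRs in dB, each clamped to [lo, hi] (assumed range)."""
    n = (min(len(reference), len(estimate)) // frame_len) * frame_len
    ref = reference[:n].reshape(-1, frame_len)
    err = ref - estimate[:n].reshape(-1, frame_len)
    eps = 1e-12
    snr = 10 * np.log10((np.sum(ref ** 2, axis=1) + eps) / (np.sum(err ** 2, axis=1) + eps))
    return float(np.mean(np.clip(snr, lo, hi)))

# Synthetic stand-ins: a 440 Hz tone as "target", white noise as "masker".
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
target = np.sin(2 * np.pi * 440 * t)
for snr_db in range(-5, 20, 5):  # -5 dB to +15 dB in steps of 5 dB
    masker = scale_masker_to_snr(target, rng.standard_normal(t.size), snr_db)
    mixture = target + masker
    mask = ideal_binary_mask(target, masker)
    enhanced = istft(mask * stft(mixture), len(mixture))
    print(snr_db, segmental_snr(target, mixture), segmental_snr(target, enhanced))
```

The mask is applied to the mixture spectrogram rather than to the target, so the resynthesized signal still contains whatever masker energy falls in the retained time-frequency units. Segmental SNR is computed against the clean target.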
Similar resources
On Ideal Binary Mask As the Computational Goal of Auditory Scene Analysis
What is the computational goal of auditory scene analysis? This is a key issue to address in the Marrian information-processing framework. It is also an important question for researchers in computational auditory scene analysis (CASA) because it bears directly on how a CASA system should be evaluated. In this chapter I discuss different objectives used in CASA. I suggest as a main CASA goal th...
A Mask Estimation Method Integrating Data Field Model for Speech Enhancement
In most approaches based on computational auditory scene analysis (CASA), the ideal binary mask (IBM) is often used for noise reduction. However, it is almost impossible to obtain the IBM result. The error in IBM estimation may greatly violate the smoothly evolving nature of speech because of the absence of energy in many speech-dominated time-frequency (TF) units. To reduce the error, the ideal ratio ...
A computational auditory scene analysis system for speech segregation and robust speech recognition
A conventional automatic speech recognizer does not perform well in the presence of multiple sound sources, while human listeners are able to segregate and recognize a signal of interest through auditory scene analysis. We present a computational auditory scene analysis system for separating and recognizing target speech in the presence of competing speech or noise. We estimate, in two stages, ...
Mask estimation incorporating time-frequency trajectories for a CASA-based ASR front-end
In this paper, we propose a mask estimation method for a computational auditory scene analysis (CASA) based speech recognition front-end using speech obtained from two microphones. The proposed mask estimation method incorporates the observation that the mask information should be correlated over contiguous analysis time frames and adjacent frequency channels. To this end, two different hidden ...
A Dual-Microphone Speech Enhancement Algorithm for Close-Talk System
While human listening is robust in complex auditory scenes, current speech enhancement algorithms do not perform well in noisy environments, even when a close-talk system is used. This paper addresses robustness in a dual-microphone embedded close-talk system by employing a computational auditory scene analysis (CASA) framework. The energy difference between the two microphones is used as the primar...